Random Positive-Only Projections: PPMI-Enabled Incremental Semantic Space Construction

Authors

  • Behrang Q. Zadeh
  • Laura Kallmeyer
Abstract

We introduce positive-only projection (PoP), a new algorithm for constructing semantic spaces and word embeddings. The PoP method employs random projections. Hence, it is highly scalable and computationally efficient. In contrast to previous methods that use random projection matrices R with the expected value of 0 (i.e., E(R) = 0), the proposed method uses R with E(R) > 0. We use Kendall’s τb correlation to compute vector similarities in the resulting non-Gaussian spaces. Most importantly, since E(R) > 0, weighting methods such as positive pointwise mutual information (PPMI) can be applied to PoP-constructed spaces after their construction for efficiently transferring PoP embeddings onto spaces that are discriminative for semantic similarity assessments. Our PoP-constructed models, combined with PPMI, achieve an average score of 0.75 in the MEN relatedness test, which is comparable to results obtained by state-of-the-art algorithms.
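The core idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the exact sampling scheme for the non-negative random matrix R, its sparsity, and the corpus processing are assumptions here. It shows an incremental construction in which each target word accumulates the random index vectors of its context words, followed by a Kendall τb similarity between two resulting embeddings.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)

vocab_size, dim = 1000, 100

# Sparse non-negative random projection matrix R with E(R) > 0:
# each row gets a few entries set to +1 (an assumed instantiation;
# the paper's actual distribution for R may differ).
R = np.zeros((vocab_size, dim))
for i in range(vocab_size):
    nonzero = rng.choice(dim, size=4, replace=False)
    R[i, nonzero] = 1.0

# Incremental construction: for each observed co-occurrence
# (target t, context c), add the context word's random index
# vector to the target word's embedding.
embeddings = np.zeros((vocab_size, dim))
cooccurrences = [(0, 5), (0, 7), (1, 5), (1, 9)]  # toy (target, context) pairs
for t, c in cooccurrences:
    embeddings[t] += R[c]

# Similarity in the resulting non-Gaussian space is computed with
# Kendall's tau-b rank correlation rather than, e.g., cosine.
tau, _ = kendalltau(embeddings[0], embeddings[1])
```

Because all entries of R are non-negative, the accumulated cells remain interpretable as (randomly merged) co-occurrence counts, which is what makes a post-hoc count-based weighting such as PPMI applicable to the constructed space.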


Similar Articles

Sketching Word Vectors Through Hashing

We propose a new fast word embedding technique using hash functions. The method is a derandomization of a new type of random projections: By disregarding the classic constraint used in designing random projections (i.e., preserving pairwise distances in a particular normed space), our solution exploits extremely sparse non-negative random projections. Our experiments show that the proposed meth...


Random Manhattan Integer Indexing: Incremental L1 Normed Vector Space Construction

Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in the distributional approaches to semantics. In VSMs, high-dimensional vectors represent linguistic entities. In an application, the similarity of vectors, and thus the entities that they represent, is computed by a distance formula. The high dimensionality of vectors, however, is a barrier to the pe...


On the Nature of Semantic Similarity and Its Measuring with Distributional Semantics Models

The paper describes our application of the distributional semantic model (DSM) method that we developed for The First International Workshop on Russian Semantic Similarity Evaluation (RUSSE) shared relatedness task. The model was trained, for the most part, on the data of the Russian National Corpus main subcorpus (around 200 mln tokens), and the resulting vector space was weighted according to...


ISA meets Lara: An incremental word space model for cognitively plausible simulations of semantic learning

We introduce Incremental Semantic Analysis (ISA), a fully incremental word space model, and we test it on longitudinal child-directed speech data. On this task, ISA outperforms the related Random Indexing algorithm, as well as an SVD-based technique. In addition, the model has interesting properties that might also be characteristic of the semantic space of children.


Incremental semantic scales by strings

Scales for natural language semantics are analyzed as moving targets, perpetually under construction and subject to adjustment. Projections, factorizations and constraints are described on strings of bounded but refinable granularities, shaping types by the processes that put semantics in flux.



Publication date: 2016